IBM Research
Toward Continuous Neurocognitive Monitoring: Integrating Speech AI with Relational Graph Transformers for Rare Neurological Diseases
Norel, Raquel, Merler, Michele, Modi, Pavitra
Patients with rare neurological diseases report cognitive symptoms--"brain fog"--that are invisible to traditional tests. A proof-of-concept in phenylketonuria (PKU) shows that a speech-derived "Proficiency in Verbal Discourse" measure correlates [...]. Success would transform episodic neurology into continuous, personalized monitoring for millions globally. In PKU, adults describe "brain fog" and working memory deficits. We envision smartphone-based speech analysis integrated with medical databases via Relational Graph Transformers (RELGT), enabling continuous neurocognitive monitoring--transforming reactive, episodic care into proactive precision neurology. Parkinson's disease involves hypophonia and speech fluctuations tied to medication; Huntington's disease reflects CAG-repeat-driven degeneration and progressive motor-cognitive decline; Wilson's disease presents with dysarthria linked to copper accumulation.
Exploring Human Perceptions of AI Responses: Insights from a Mixed-Methods Study on Risk Mitigation in Generative Models
Candello, Heloisa, Azmat, Muneeza, Gunturi, Uma Sushmitha, Horesh, Raya, de Paula, Rogerio Abreu, Pimentel, Heloisa, Grave, Marcelo Carpinette, Adebiyi, Aminat, Machado, Tiago, de Macedo, Maysa Malfiza Garcia
With the rapid uptake of generative AI, investigating human perceptions of generated responses has become crucial. A major challenge is these models' `aptitude' for hallucinating and generating harmful content. Despite major efforts to implement guardrails, human perceptions of these mitigation strategies are largely unknown. We conducted a mixed-methods experiment to evaluate the responses of a mitigation strategy across multiple dimensions: faithfulness, fairness, harm-removal capacity, and relevance. In a within-subject study design, 57 participants assessed responses under two conditions: a harmful response plus its mitigation, and the mitigated response alone. Results revealed that participants' native language, AI work experience, and annotation familiarity significantly influenced evaluations. Participants showed high sensitivity to linguistic and contextual attributes, penalizing minor grammar errors while rewarding preserved semantic context. This contrasts with how language is often treated in the quantitative evaluation of LLMs. We also introduce new metrics for training and evaluating mitigation strategies, along with insights for human-AI evaluation studies.
Solving Context Window Overflow in AI Agents
Labate, Anton Bulle, de Sousa, Valesca Moura, Fiorini, Sandro Rama, Azevedo, Leonardo Guerreiro, Thiago, Raphael Melo, da Silva, Viviane Torres
Large Language Models (LLMs) have become increasingly capable of interacting with external tools, granting access to specialized knowledge beyond their training data - critical in dynamic, knowledge-intensive domains such as Chemistry and Materials Science. However, large tool outputs can overflow the LLMs' context window, preventing task completion. Existing solutions such as truncation or summarization fail to preserve complete outputs, making them unsuitable for workflows requiring the full data. This work introduces a method that enables LLMs to process and utilize tool responses of arbitrary length without loss of information. By shifting the model's interaction from raw data to memory pointers, the method preserves tool functionality, allows seamless integration into agentic workflows, and reduces token usage and execution time. The proposed method is validated on a real-world Materials Science application that cannot be executed with conventional workflows, and its effectiveness is demonstrated via a comparative analysis where both methods succeed. In this experiment, the proposed approach consumed approximately seven times fewer tokens than the traditional workflow.
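The pointer mechanism described above can be sketched in a few lines. The store, threshold, and function names below are illustrative assumptions, not the paper's actual implementation:

```python
import uuid

# Hypothetical in-memory store; in a real agent this could be a database or file store.
_MEMORY: dict[str, str] = {}

MAX_INLINE_CHARS = 2000  # assumed threshold beyond which output is stored, not inlined

def store_tool_output(output: str) -> str:
    """Return the raw output if small, else a short memory pointer the LLM can pass around."""
    if len(output) <= MAX_INLINE_CHARS:
        return output
    key = f"mem://{uuid.uuid4().hex}"
    _MEMORY[key] = output
    # The LLM only ever sees this short pointer, keeping the context window small.
    return key

def resolve(ref: str) -> str:
    """Dereference a memory pointer when a downstream tool needs the full data."""
    return _MEMORY.get(ref, ref)  # plain strings pass through unchanged

# Usage: a tool returning a result far larger than the inline threshold.
big_result = "x" * 100_000
ptr = store_tool_output(big_result)
```

Here the model's context holds only `ptr` (a few dozen characters) instead of the 100,000-character payload, while downstream tools recover the complete output via `resolve`, so no information is lost.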
Episodic Memory in Agentic Frameworks: Suggesting Next Tasks
Fiorini, Sandro Rama, Azevedo, Leonardo G., Thiago, Raphael M., de Sousa, Valesca M., Labate, Anton B., da Silva, Viviane Torres
Agentic frameworks powered by Large Language Models (LLMs) can be useful tools in scientific workflows by enabling human-AI co-creation. A key challenge is recommending the next steps during workflow creation without relying solely on LLMs, which risk hallucination and require fine-tuning with scarce proprietary data. We propose an episodic memory architecture that stores and retrieves past workflows to guide agents in suggesting plausible next tasks. By matching current workflows with historical sequences, agents can recommend steps based on prior patterns.
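A minimal sketch of the matching idea, with an invented episode store and a simple last-task vote standing in for the authors' retrieval scheme:

```python
from collections import Counter

# Hypothetical episodic memory: past workflows stored as task sequences.
# Task names and the matching rule are illustrative, not the paper's design.
EPISODES = [
    ["load_data", "clean", "featurize", "train", "evaluate"],
    ["load_data", "clean", "train", "evaluate"],
    ["load_data", "featurize", "train", "deploy"],
]

def suggest_next(current, episodes=EPISODES, k=2):
    """Rank next tasks by how often they followed the current workflow's last task."""
    if not current:
        return []
    last = current[-1]
    votes = Counter()
    for ep in episodes:
        for i, task in enumerate(ep[:-1]):
            if task == last:
                votes[ep[i + 1]] += 1
    return [t for t, _ in votes.most_common(k)]

# After loading and cleaning, memory suggests what usually came next in past workflows.
suggestions = suggest_next(["load_data", "clean"])
```

Because suggestions come from stored episodes rather than free generation, every recommended task is one that actually occurred in a prior workflow, which is what curbs hallucination.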
Cardinality-Regularized Hawkes-Granger Model
This section provides the parameter estimation equations in the MM procedure Eq. (13) for the baseline. Below, we provide results for the exponential and power distributions. We also describe the details of the experiments. The Dense10 data sets and the Python code to generate them are included as part of the final submission. Due to the stochastic nature of the generator, the total number of event instances cannot be controlled; see the attached code for details.
Toward Cybersecurity-Expert Small Language Models
Levi, Matan, Ohayon, Daniel, Blobstein, Ariel, Sagi, Ravid, Molloy, Ian, Allouche, Yair
Large language models (LLMs) are transforming everyday applications, yet deployment in cybersecurity lags due to a lack of high-quality, domain-specific models and training datasets. To address this gap, we present CyberPal 2.0, a family of cybersecurity-expert small language models (SLMs) ranging from 4B to 20B parameters. To train CyberPal 2.0, we generate an enriched chain-of-thought cybersecurity instruction dataset built with our data enrichment and formatting pipeline, SecKnowledge 2.0, which integrates expert-in-the-loop steering of reasoning formats alongside LLM-driven multi-step grounding, yielding higher-fidelity, task-grounded reasoning traces for security tasks. Across diverse cybersecurity benchmarks, CyberPal 2.0 consistently outperforms its baselines and matches or surpasses various open and closed-source frontier models, while remaining a fraction of their size. On core cyber threat intelligence knowledge tasks, our models outperform almost all tested frontier models, ranking second only to Sec-Gemini v1. On core threat-investigation tasks, such as correlating vulnerabilities and bug tickets with weaknesses, our best 20B-parameter model outperforms GPT-4o, o1, o3-mini, and Sec-Gemini v1, ranking first, while our smallest 4B-parameter model ranks second.
Statistical multi-metric evaluation and visualization of LLM system predictive performance
Ackerman, Samuel, Farchi, Eitan, Raz, Orna, Toledo, Assaf
The evaluation of generative or discriminative large language model (LLM)-based systems is often a complex multi-dimensional problem. Typically, a set of system configuration alternatives are evaluated on one or more benchmark datasets, each with one or more evaluation metrics, which may differ between datasets. We often want to evaluate -- with a statistical measure of significance -- whether systems perform differently either on a given dataset according to a single metric, on aggregate across metrics on a dataset, or across datasets. Such evaluations can be done to support decision-making, such as deciding whether a particular system component change (e.g., choice of LLM or hyperparameter values) significantly improves performance over the current system configuration, or, more generally, whether a fixed set of system configurations (e.g., a leaderboard list) have significantly different performances according to metrics of interest. We present a framework implementation that automatically performs the correct statistical tests, properly aggregates the statistical results across metrics and datasets (a nontrivial task), and can visualize the results. The framework is demonstrated on the multi-lingual code generation benchmark CrossCodeEval, for several state-of-the-art LLMs.
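One simple instance of such a significance test is a paired permutation (sign-flip) test on per-dataset scores of two configurations. The scores below are invented for illustration; the framework itself selects among tests and handles the cross-metric aggregation:

```python
from itertools import product

# Invented per-dataset scores for two system configurations on a single metric.
scores_a = [0.71, 0.64, 0.80, 0.58, 0.69]
scores_b = [0.68, 0.60, 0.77, 0.59, 0.65]

def paired_permutation_pvalue(a, b):
    """Exact two-sided paired permutation test on the summed score difference.

    Enumerates all sign flips of the paired differences and counts how often
    the flipped statistic is at least as extreme as the observed one.
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = total = 0
    for signs in product([1, -1], repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed:
            hits += 1
    return hits / total

p = paired_permutation_pvalue(scores_a, scores_b)
```

With only five datasets the smallest attainable p-value is 1/2^5, which illustrates why aggregating evidence across metrics and datasets, as the framework does, matters for reaching significance.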
MHG-GNN: Combination of Molecular Hypergraph Grammar with Graph Neural Network
Kishimoto, Akihiro, Kajino, Hiroshi, Hirose, Masataka, Fuchiwaki, Junta, Priyadarsini, Indra, Hamada, Lisa, Shinohara, Hajime, Nakano, Daiju, Takeda, Seiji
Property prediction plays an important role in material discovery. As an initial step toward eventually developing a foundation model for materials science, we introduce a new autoencoder called MHG-GNN, which combines a graph neural network (GNN) with Molecular Hypergraph Grammar (MHG). Results on a variety of property prediction tasks with diverse materials show that MHG-GNN is promising.
Human-AI Co-Creation Approach to Find Forever Chemicals Replacements
Ferreira, Juliana Jansen, Segura, Vinícius, Souza, Joana G. R., Barbosa, Gabriel D. J., Gallas, João, Cerqueira, Renato, Zubarev, Dmitry
Generative models are a powerful tool in AI for material discovery. We are designing a software framework that supports a human-AI co-creation process to accelerate finding replacements for ``forever chemicals'' -- chemicals that enable our modern lives but are harmful to the environment and human health. Our approach combines AI capabilities with the domain-specific tacit knowledge of subject matter experts to accelerate material discovery. Our co-creation process starts with the interaction between the subject matter experts and a generative model that can generate new molecule designs. In this position paper, we discuss our hypothesis that these subject matter experts can benefit from a more iterative interaction with the generative model, asking for smaller samples and ``guiding'' the exploration of the discovery space with their knowledge.